Redundancy-Aware Topic Modeling for Patient Record Notes

Authors

  • Raphael Cohen
  • Iddo Aviram
  • Michael Elhadad
  • Noémie Elhadad
Abstract

The clinical notes in a given patient record contain much redundancy, in large part due to clinicians' documentation habit of copying from previous notes in the record and pasting into a new note. Previous work has shown that this redundancy has a negative impact on the quality of text mining and topic modeling in particular. In this paper we describe a novel variant of Latent Dirichlet Allocation (LDA) topic modeling, Red-LDA, which takes into account the inherent redundancy of patient records when modeling content of clinical notes. To assess the value of Red-LDA, we experiment with three baselines and our novel redundancy-aware topic modeling method: given a large collection of patient records, (i) apply vanilla LDA to all documents in all input records; (ii) identify and remove all redundancy by choosing a single representative document for each record as input to LDA; (iii) identify and remove all redundant paragraphs in each record, leaving partial, non-redundant documents as input to LDA; and (iv) apply Red-LDA to all documents in all input records. Both quantitative evaluation, carried out through log-likelihood on held-out data and topic coherence of produced topics, and qualitative assessment of topics, carried out by physicians, show that Red-LDA produces superior models to all three baseline strategies. This research contributes to the emerging field of understanding the characteristics of the electronic health record and how to account for them in the framework of data mining. The code for the two redundancy-elimination baselines and Red-LDA is made publicly available to the community.
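
As a rough illustration of baseline (iii), the sketch below drops any paragraph that already appeared earlier in the same patient record before the notes would be passed to a topic model. It is not the authors' released code: exact matching after case and whitespace normalization, the blank-line paragraph delimiter, and the names `normalize` and `drop_redundant_paragraphs` are all assumptions made for illustration. Copy-paste detection on real clinical notes would likely need fuzzier matching.

```python
import re
from typing import List


def normalize(paragraph: str) -> str:
    """Lowercase and collapse whitespace so re-pasted text compares equal."""
    return re.sub(r"\s+", " ", paragraph.strip().lower())


def drop_redundant_paragraphs(record: List[str]) -> List[str]:
    """Keep only paragraphs not seen earlier in the same patient record.

    `record` is a chronologically ordered list of notes. Paragraphs are
    assumed to be separated by blank lines (an illustrative assumption).
    """
    seen = set()
    reduced = []
    for note in record:
        kept = []
        for para in re.split(r"\n\s*\n", note):
            key = normalize(para)
            if key and key not in seen:
                seen.add(key)
                kept.append(para.strip())
        # A note may end up partial, or empty if everything was pasted.
        reduced.append("\n\n".join(kept))
    return reduced


# Hypothetical two-note record: the second note re-pastes the first paragraph.
record = [
    "Pt with hx of breast cancer.\n\nStarted tamoxifen.",
    "Pt with hx of  breast cancer.\n\nReports mild nausea.",
]
reduced = drop_redundant_paragraphs(record)
```

Here the second note keeps only its new paragraph, which mirrors the "partial, non-redundant documents" the abstract describes as input to LDA.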

Similar resources

Correction: Redundancy-Aware Topic Modeling for Patient Record Notes

Copyright: 2014 The PLOS ONE Staff. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Figure 3. Topics learned by Red-LDA (top) and Vanilla LDA (bottom) on the EHR corpus. Both topics are about breast cancer (...

Detecting clinically relevant new information in clinical notes across specialties and settings

BACKGROUND Automated methods for identifying clinically relevant new versus redundant information in electronic health record (EHR) clinical notes are useful for clinicians and researchers involved in patient care and clinical research, respectively. We evaluated methods to automatically identify clinically relevant new information in clinical notes, and compared the quantity of redundant inform...

Research paper: Quantifying clinical narrative redundancy in an electronic health record

OBJECTIVE Although electronic notes have advantages compared to handwritten notes, they take longer to write and promote information redundancy in electronic health records (EHRs). We sought to quantify redundancy in clinical documentation by studying collections of physician notes in an EHR. DESIGN AND METHODS We implemented a retrospective design to gather all electronic admission, progress...

Structuring the Unstructured Note: Automatic Organizing and Formatting for lecture notes

Note taking is an important practice during lectures. With the rapid development of electronic devices such as smart phones, tablets and laptops, digital note-taking has become an option for many students. In the classroom setting, note takers might fail to record clearly organized notes due to limitations of time and devices. Therefore, users often need to manually organize long paragraphs of...

Evaluating measures of redundancy in clinical texts.

Although information redundancy has been reported as an important problem for clinicians when using electronic health records and clinical reports, measuring redundancy in clinical text has not been extensively investigated. We evaluated several automated techniques to quantify the redundancy in clinical documents using an expert-derived reference standard consisting of outpatient clinical docu...

Journal title:

Volume 9  Issue 

Pages  -

Publication date 2014